Abstract

We explain an algorithm that approximately but efficiently assesses particular 
parity-check error-correcting codes of large, but finite, blocklength. This algo- 
rithm is based on the "renormalization-group" approach from physics: the idea 
is to continually replace an error-correcting code with a simpler error-correcting 
code that has nearly identical performance, until the code is reduced to a small 
enough size that its performance can be computed exactly. This assessment 
algorithm can be used as a subroutine in a more general algorithm to search 
for optimal error-correcting codes of specified blocklength and rate. 

1 Introduction

A fundamental problem in the field of information theory is the design of
optimal or nearly optimal error-correcting codes of given blocklength and rate
that can also be decoded practically. This problem can now be considered
essentially solved in the small blocklength (e.g. $N < 100$) and very large
blocklength (e.g. $N > 10^6$) regimes. However, error-correcting codes that
are used in practical situations (for example, for wireless communication) 
typically have blocklengths in an intermediate regime (around N = 2000). 
(The reason that intermediate blocklength codes are used in practice is that 
larger blocklength codes have better performance, but have a longer decod- 
ing time, so one will normally choose the code with the largest blocklength 
for which the 'lag' caused by decoding is still tolerable.) 

In the small blocklength regime, classical coding theory, as summarized 
in textbooks such as [1], provides a panoply of codes of different blocklengths
and rates, many of which are known to be optimal or nearly optimal. As
long as the blocklength is small enough, these codes can also be decoded
practically (and optimally) using maximum-likelihood decoders. 

In the last few years, the problem of finding good codes in the very large 
blocklength regime has been essentially solved, but in a very different way:
by focusing on parity-check codes defined using sparse (generalized) parity
check matrices [2]. These kinds of codes were first introduced by Gallager
in 1962 [3], but were not properly appreciated until recently. In the last
eight years, however, new and improved codes defined by sparse generalized
parity check matrices (such as turbocodes [4], [5], irregular low-density parity
check (LDPC) codes [?], Kanter-Saad codes [?], [12], repeat-accumulate
codes [13], and irregular repeat-accumulate codes [14]) have been
the object of intense study. Such codes have three particularly noteworthy
advantages. First, they can be efficiently decoded using belief propagation
(BP) iterative decoding [15]. Secondly, their performance can often be
theoretically analyzed in the infinite-blocklength limit using the density evolution
approach [16]. Finally, using the density evolution approach, or through
simulations, one can demonstrate that these codes are good codes, in the sense
that in the infinite-blocklength limit, BP decoding will perfectly decode all 
message blocks that have a noise level below some threshold level, and that 
threshold level is often not too far from the Shannon limit. 

In recent years, a favored way to design new codes has thus been to 
optimize codes for the infinite blocklength limit using density evolution, and
to hope that a scaled-down version would still be a good code [?], [14].
The problem with this approach is that for $N < 10^4$ at least, we are still
noticeably far from the infinite-blocklength limit. In particular, simulations 
will find many decoding failures at noise levels far below the threshold level 
predicted by infinite blocklength calculations. Furthermore, there will not 
necessarily even exist a way to scale down the codes derived from the density 
evolution approach. For example, the best known irregular LDPC codes 
at a given rate (in the $N \to \infty$ limit) will often have variable nodes that
should participate in hundreds or even thousands of parity checks [?], which
obviously makes no sense if the overall number of parity checks is 100 or less. 
When one wants to make real codes of finite blocklength, one is therefore 
often forced to choose a code which is sub-optimal in the infinite-blocklength 
limit, with no theoretical guidance. 

Our goal, which is achieved by the renormalization group approach de- 
scribed here, has therefore been to develop an assessment algorithm more 
powerful than the ordinary density evolution approach, which will predict, 
at least approximately, the decoding failure rate as a function of the noise 
level for a specific code of finite blocklength. It is important to realize that for 
finite blocklengths, one does not expect perfect decoding below any particu- 
lar threshold noise level, so that to evaluate a code, one now needs a whole 
performance curve rather than a single number (the critical noise threshold) 
as might be computed in the density evolution approach. 

The outline of the rest of this paper is as follows. In the next section, we 
review the density evolution approach for the binary erasure channel, where it 
is particularly simple. We pay particular attention to codes defined on trees, 
for which the density evolution approach becomes exact. Section 3 is the 
heart of the paper, where we introduce and explain our renormalization-group 
(RG) approach. We show how it recovers exact answers for codes defined on 
trees, and give a procedure, which can be made increasingly accurate at the
cost of more computational power, to handle codes defined on graphs with
loops. We present some numerical results comparing our RG calculations 
with simulations of realistic finite blocklength codes in section 4. In section 
5, we explain how to extend the RG approach to the Additive White Gaussian 
Noise (AWGN) channel. In section 6, we speculate on how one might use our 
RG algorithm as a sub-routine for a more general algorithm for the design 
of codes. 

2 The density evolution approach for the binary erasure channel (BEC)

The density evolution approach is analytically very simple for the binary 
erasure channel [16], [17]. Since this approach is important background for our
own RG approach, we will review it in this section. 



2.1 Parity check codes 

We will begin by studying linear block binary codes which can be defined 
in terms of an ordinary parity check matrix. In a parity check matrix A, 
the columns represent transmitted variable bits, while the rows define linear 
constraints between the variable bits. More specifically, the matrix A defines 
a set of valid vectors (codewords) $z$, such that each component of $z$ is 0 or
1, and

$$ A z = 0 \tag{1} $$

where we assume all multiplication and addition is modulo 2. 

If a parity check matrix has N columns and N - k rows, it will represent a
code of blocklength N and rate k/N (unless some of the rows are not linearly 
independent, in which case some of the parity checks are redundant, and the 
code will actually be of higher rate). 

For each parity check matrix, there is a corresponding Tanner graph
representation [18]. A Tanner graph is a bipartite graph with two kinds of
nodes: variable nodes (which we denote by circles) and check nodes (denoted 
by squares). In a Tanner graph, each check node is connected to the variable 
nodes that represent the bits involved in that check. For example, the parity 
check matrix 

$$ A = \begin{pmatrix} 1 & 1 & 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 & 0 & 1 \end{pmatrix} \tag{2} $$

corresponds to the Tanner graph shown in figure 1.

Codes represented by parity check matrices are "linear," which means 
that all the codewords are linear combinations of other codewords. There 
will be $2^k$ codewords, each of length $N$; e.g., for the example given above, the
codewords are 000000, 001011, 010101, 011110, 100110, 101101, 110011, and
111000. Because of the linearity property, we may use any of the codewords
as a representative; throughout this paper, we will always assume that the
all-zeros codeword is transmitted.

Figure 1: Tanner graph for a simple error-correcting code.
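
As a quick check of the definition in equation (1), the codewords of a small parity check matrix can be enumerated by brute force. The sketch below assumes the 3 x 6 matrix of equation (2) written out with explicit zeros; the function name `codewords` is illustrative and does not come from the original text.

```python
from itertools import product

# Parity check matrix assumed for equation (2); rows are checks, columns are bits.
A = [
    [1, 1, 0, 1, 0, 0],
    [1, 0, 1, 0, 1, 0],
    [0, 1, 1, 0, 0, 1],
]

def codewords(matrix):
    """Return every binary vector z with matrix * z = 0 (mod 2), as strings."""
    n = len(matrix[0])
    found = []
    for z in product((0, 1), repeat=n):
        # z is a codeword when every row has an even overlap with it
        if all(sum(a * b for a, b in zip(row, z)) % 2 == 0 for row in matrix):
            found.append("".join(str(bit) for bit in z))
    return found

words = codewords(A)
print(len(words), words)
```

With three linearly independent checks on six bits, the enumeration finds $2^3 = 8$ codewords, which corresponds to a rate of 1/2. Brute force is only practical for very small N, since it visits all $2^N$ binary vectors.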



2.2 Belief propagation decoding in the BEC 

The binary erasure channel is a binary input channel with three output sym- 
bols: a 0, a 1, and an erasure, which can be represented by a question mark 
?. The input symbol will pass through the channel as an erasure with
probability $x$ and will be received correctly with probability $1 - x$. It is important
to note that the BEC never flips bits from 0 to 1 or vice versa. If we assume
that the all-zeros codeword is transmitted, all received words will thus consist 
entirely of zeros and erasures. 

We will assume that the receiver decodes using a belief propagation (BP) 
decoder with discrete messages. A message $m_{ia}$ will be sent from each variable
node $i$ to each check node $a$ that it participates in, with the message representing
information about the state of the variable node. In general, the message can
be in one of three states: 1, 0, or ?, but since we assume that the all-zeros
codeword is always transmitted, we can ignore the possibility that $m_{ia}$ has
value 1.

Similarly, there will be a message $m_{ai}$ sent from each check node $a$ to all
the variable nodes $i$ that participate in that check. These messages should be
interpreted as directives from the check to the variable node about what state
it should be in, based on the states of the other variable nodes participating
in the check. The check-to-bit messages can again in principle take on the
values 0, 1, or ?, but again only the two messages 0 and ? will be relevant
when the all-zeros codeword is transmitted. 

In the BP decoding algorithm for the BEC, a message $m_{ia}$ from a variable
node $i$ to a check node $a$ will be equal to any non-erasure message that the
variable node has received, either from the channel or from another check node
(because such messages are always correct in the BEC), or to an erasure if the
channel output and all incoming messages from the other check nodes are
erasures. A message $m_{ai}$ from a check node $a$ to a variable node $i$ will be an
erasure if any incoming message from another node participating in the check
is an erasure; otherwise it will take on the value of the binary sum of all
incoming messages from the other nodes participating in the check.

The BP decoding algorithm is an iterative algorithm. One should ini- 
tialize the messages so that all variable nodes that are not erased send out 
messages equal to the corresponding received bit, and all other messages 
are initially erasures. Iterating the BP message equations, one will even- 
tually always converge to stationary messages (convergence of BP decoding 
algorithms is guaranteed for the particularly simple BEC, but not for other 
channels). The final decoded value of any erased variable node is just the 
value of any non-erasure message coming into that node, unless there is no in- 
coming non-erasure message, in which case the BP decoding algorithm gives 
up and fails to decode that particular variable node. 
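
The decoding rules above can be sketched in a few lines. The version below is a minimal illustration, not the authors' implementation: it uses a sequential "peeling" schedule (repeatedly fill in any check that has exactly one erased bit) rather than explicit message passing, on the assumption that for the BEC this reaches the same final set of decoded bits. The name `bp_decode_bec` and the use of `None` for an erasure are choices made for this sketch.

```python
def bp_decode_bec(checks, received):
    """Decode a word received over the BEC.

    checks   -- list of checks, each a list of 0-based bit positions
    received -- list of 0, 1 or None (None marks an erasure)
    Returns the word with every recoverable erasure filled in.
    """
    word = list(received)
    changed = True
    while changed:
        changed = False
        for check in checks:
            erased = [i for i in check if word[i] is None]
            if len(erased) == 1:
                # All other bits in the check are known, so the erased bit
                # must equal their sum modulo 2.
                known_sum = sum(word[i] for i in check if word[i] is not None)
                word[erased[0]] = known_sum % 2
                changed = True
    return word

# Checks of the six-bit example code, written with 0-based bit positions.
checks = [[0, 1, 3], [0, 2, 4], [1, 2, 5]]
decoded = bp_decode_bec(checks, [1, None, 0, None, 1, 0])
print(decoded)
```

In this example the third check first recovers bit 1, after which the first check can recover bit 3. Any position still equal to `None` in the result is a bit that the decoder gives up on.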

2.3 Density evolution 

We now consider the average of BP decoding over many blocks. Associated 
with each message $m_{ia}$, we introduce a real number $p_{ia}$ which represents the
probability that the message $m_{ia}$ is an erasure. Similarly, we associate with
each message $m_{ai}$ a real number $q_{ai}$ which represents the probability that the
message $m_{ai}$ is an erasure.

In the density evolution approach, we compute the probabilities $p_{ia}$ and
$q_{ai}$ in a way that is exact as long as the Tanner graph representing the code
has no loops. We take

$$ p_{ia} = x \prod_{b \in N(i) \setminus a} q_{bi} \tag{3} $$

where $b \in N(i) \setminus a$ represents all check nodes that neighbor variable node $i$
except for check node $a$. This equation can be derived from the fact that
for a message $m_{ia}$ to be an erasure, the variable node $i$ must be erased in
transmission, and all incoming messages from other checks must be erasures 
as well. Of course, if the incoming messages were correlated, this equation 
would not be correct, but on a Tanner graph with no loops, each incoming 
message is independent of the others. 
Similarly, we find that 

$$ q_{ai} = 1 - \prod_{j \in N(a) \setminus i} (1 - p_{ja}) \tag{4} $$

which can be derived (again assuming incoming messages are uncorrelated)
from the fact that a message $m_{ai}$ will only be in a 0 or 1 state if all incoming
messages are in a 0 or 1 state.

The density evolution equations (3) and (4) can be solved by iteration.
A good initialization is $p_{ia} = x$ for all messages from variable nodes to check
nodes and $q_{ai} = 0$ for all messages from check nodes to variable nodes, as
long as one begins the iteration with the $q_{ai}$ messages. The BEC density
evolution equations should ultimately converge (this can be guaranteed for
codes defined on graphs without loops). One can finally compute $b_i$, which
is the probability of a failure to decode at variable node $i$, from the formula

$$ b_i = x \prod_{a \in N(i)} q_{ai} \tag{5} $$
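
The iteration just described can be written down directly. The following is a minimal sketch under stated assumptions: the Tanner graph is passed as a dictionary from check labels to lists of variable labels, a fixed number of sweeps is used instead of a convergence test, and the names `density_evolution` and `failure` are illustrative.

```python
def density_evolution(checks, x, iterations=50):
    """Iterate equations (3) and (4), then return b_i from equation (5).

    checks maps each check label a to the list of variable labels in N(a).
    """
    var_checks = {}
    for a, variables in checks.items():
        for i in variables:
            var_checks.setdefault(i, []).append(a)

    # Initialization from the text: p = x and q = 0, starting with the q update.
    p = {(i, a): x for i, cs in var_checks.items() for a in cs}
    q = {(a, i): 0.0 for a, vs in checks.items() for i in vs}

    for _ in range(iterations):
        for (a, i) in q:                      # equation (4)
            prod = 1.0
            for j in checks[a]:
                if j != i:
                    prod *= 1.0 - p[(j, a)]
            q[(a, i)] = 1.0 - prod
        for (i, a) in p:                      # equation (3)
            prod = x
            for other in var_checks[i]:
                if other != a:
                    prod *= q[(other, i)]
            p[(i, a)] = prod

    failure = {}
    for i, cs in var_checks.items():          # equation (5)
        prod = x
        for a in cs:
            prod *= q[(a, i)]
        failure[i] = prod
    return failure

# Hypothetical example: a single parity check on three bits, erasure rate 0.5.
result = density_evolution({"a": [1, 2, 3]}, 0.5)
print(result)
```

For the single-check example each bit fails to decode only if it is erased and at least one of the other two bits is erased as well, so each $b_i$ equals $x(1 - (1 - x)^2)$.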



2.4 Exact solution of a small code 

As mentioned, the density evolution equations (3), (4), and (5) should be
exact when the code has a Tanner graph representation without loops. Let
us work through a small example, to see how this works. We consider the
code with parity check matrix

$$ A = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 1 \end{pmatrix} \tag{6} $$

and a corresponding Tanner graph shown in figure 2.

Figure 2: Tanner graph for the code defined by equation (6).

This code has four codewords: 0000, 0011, 1101, and 1110. If the 0000 
message is transmitted, there will be sixteen possible received messages: 
0000, 000?, 00?0, 00??, 0?00, and so on. The probability of receiving a
particular message with $n_e$ erasures is $x^{n_e} (1 - x)^{4 - n_e}$. Messages might be
partially or completely decoded by a BP decoder; for example the received 
message ?00? will be fully decoded to 0000, but the message 0??? will only 
be partially decoded to 00??, because there is not enough information to 
determine whether the transmitted codeword was actually 0000 or 0011. 

We can easily compute the exact probability that a given bit will remain 
an erasure after decoding by summing over the sixteen possible received 
messages weighted by their probabilities. For example, the first bit will only 
be decoded as an erasure if one of the following messages is received: ???0,
??0?, or ????, so the total probability that the first bit will not be decoded
is $2x^3(1 - x) + x^4 = 2x^3 - x^4$. If we focus on the last bit instead, we find
that it will be decoded unless one of the following messages is received: 00??,
0???, ?0??, ??0?, or ????, so the overall probability that the fourth bit is
not decoded will be $x^2(1 - x)^2 + 3x^3(1 - x) + x^4 = x^2 + x^3 - x^4$.
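
The sum over the sixteen received messages is easy to automate. The sketch below assumes the all-zeros codeword is sent, so only the set of erased positions matters, and it assumes the four-bit code has checks on bits (1, 2) and (2, 3, 4), written 0-based in the code. The helper names are illustrative.

```python
from itertools import product

# Checks of the four-bit example code, as 0-based bit positions (an assumption
# consistent with the codewords 0000, 0011, 1101, and 1110).
CHECKS = [[0, 1], [1, 2, 3]]

def residual_erasures(erased):
    """Return the positions that are still erased after iterative decoding."""
    erased = set(erased)
    changed = True
    while changed:
        changed = False
        for check in CHECKS:
            unknown = [i for i in check if i in erased]
            if len(unknown) == 1:      # a single erased bit in a check is recoverable
                erased.discard(unknown[0])
                changed = True
    return erased

def failure_probability(bit, x):
    """Exact probability that `bit` is still erased after decoding."""
    total = 0.0
    for pattern in product((False, True), repeat=4):
        erased = [i for i, flag in enumerate(pattern) if flag]
        if bit in residual_erasures(erased):
            n_e = len(erased)
            total += x ** n_e * (1 - x) ** (4 - n_e)
    return total

x = 0.3
print(failure_probability(0, x), 2 * x**3 - x**4)
print(failure_probability(3, x), x**2 + x**3 - x**4)
```

Each printed pair compares the enumerated value with the corresponding closed-form polynomial at one sample erasure rate.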

In the density evolution approach, we need to solve equations for the
following variables: $p_{11}$, $p_{21}$, $p_{22}$, $p_{32}$, $p_{42}$, $q_{11}$, $q_{12}$, $q_{22}$, $q_{23}$, $q_{24}$, $b_1$, $b_2$, $b_3$, and
$b_4$. The equations are:

\begin{align}
p_{11} &= x \tag{7} \\
p_{21} &= x q_{22} \tag{8} \\
p_{22} &= x q_{12} \tag{9} \\
p_{32} &= x \tag{10} \\
p_{42} &= x \tag{11} \\
q_{11} &= p_{21} \tag{12} \\
q_{12} &= p_{11} \tag{13} \\
q_{22} &= 1 - (1 - p_{32})(1 - p_{42}) \tag{14} \\
q_{23} &= 1 - (1 - p_{22})(1 - p_{42}) \tag{15} \\
q_{24} &= 1 - (1 - p_{22})(1 - p_{32}) \tag{16}
\end{align}

and

\begin{align}
b_1 &= x q_{11} \tag{17} \\
b_2 &= x q_{12} q_{22} \tag{18} \\
b_3 &= x q_{23} \tag{19} \\
b_4 &= x q_{24} \tag{20}
\end{align}

Solving these equations, we find



\begin{align}
p_{11} &= x \tag{21} \\
p_{21} &= 2x^2 - x^3 \tag{22} \\
p_{22} &= x^2 \tag{23} \\
p_{32} &= x \tag{24} \\
p_{42} &= x \tag{25} \\
q_{11} &= 2x^2 - x^3 \tag{26} \\
q_{12} &= x \tag{27} \\
q_{22} &= 2x - x^2 \tag{28} \\
q_{23} &= x + x^2 - x^3 \tag{29} \\
q_{24} &= x + x^2 - x^3 \tag{30}
\end{align}

and

\begin{align}
b_1 &= 2x^3 - x^4 \tag{31} \\
b_2 &= 2x^3 - x^4 \tag{32} \\
b_3 &= x^2 + x^3 - x^4 \tag{33} \\
b_4 &= x^2 + x^3 - x^4 \tag{34}
\end{align}

Examining the results for $b_1$ and $b_4$, we see that the density evolution solution
agrees exactly with the direct approach for this code.



2.5 The large blocklength limit 

If we assume that all local neighborhoods look identical, we can simplify the 
density evolution equations. For example, if each variable node belongs to $d_v$
parity checks, and each check node is attached to $d_c$ variable nodes, then we
can take all the $p_{ia}$ equal to the same value $p$, all the $q_{ai}$ equal to the same
value $q$, and all $b_i$ equal to the same value $b$. We then find

$$ p = x q^{d_v - 1} \tag{35} $$

$$ q = 1 - (1 - p)^{d_c - 1} \tag{36} $$

and

$$ b = x q^{d_v} \tag{37} $$

which are the density evolution equations for $(d_v, d_c)$ regular Gallager codes,
valid in the $N \to \infty$ limit. A regular Gallager code [?] is a code defined by
a sparse random parity check matrix characterized by the restriction that
each row has exactly $d_c$ 1's in it, and each column contains exactly $d_v$ 1's.
The intuitive reason that these equations are valid in the infinite blocklength
limit is that as $N \to \infty$, the size of typical loops in the Tanner graph of a
regular Gallager code will also go to infinity, so all incoming messages to a
node will be independent, and a regular Gallager code will behave like a code
defined on a graph without loops.

If we solve equations (35) and (36) for specific values of $d_v$ and $d_c$, we find
that below a critical erasure threshold $x_c$, the solution is $p = q = b = 0$, which
means that decoding is perfect. Above $x_c$, $b$ will have a non-zero solution,
which corresponds to decoding failures. The threshold $x_c$ is easy to determine
numerically. For example, if $d_v = 3$ and $d_c = 5$, then $x_c \approx 0.51757$.
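
One way to determine the threshold numerically is to iterate equations (35) and (36) from $p = x$ and bisect on $x$. The sketch below is only one possible approach; the iteration cap, the tolerances, and the function names `decodes` and `threshold` are assumptions made for illustration.

```python
def decodes(x, dv, dc, max_iter=100000, tol=1e-12):
    """Return True if iterating equations (35) and (36) drives p to zero."""
    p = x
    for _ in range(max_iter):
        q = 1.0 - (1.0 - p) ** (dc - 1)        # equation (36)
        p_new = x * q ** (dv - 1)              # equation (35)
        if p_new < tol:
            return True                        # erasure probability vanishes
        if p - p_new < 1e-15:
            return False                       # stuck at a non-zero fixed point
        p = p_new
    return False

def threshold(dv, dc, tol=1e-6):
    """Bisect for the largest erasure rate at which decoding still succeeds."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if decodes(mid, dv, dc):
            lo = mid
        else:
            hi = mid
    return lo

x_c = threshold(3, 5)
print(x_c)
```

The iteration slows down considerably for $x$ just below the threshold, which is why the iteration cap is generous. The printed value should land close to the figure quoted above for $d_v = 3$ and $d_c = 5$.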

These density evolution calculations can be generalized to irregular Gallager
codes [?], or other codes like irregular repeat-accumulate codes [14],
which have a finite number of different classes of nodes with different
neighborhoods. In this generalization, one derives a system of equations, typically
with one equation for the messages leaving each class of node. By solving
the system of equations, one can again find a critical threshold $x_c$, below
which decoding is perfect. Such codes can thus be optimized in the $N \to \infty$
limit by finding the code that has maximal noise threshold $x_c$. Simulations
of such codes with very large blocklengths agree quite well with the density 
evolution predictions. 

Unfortunately, the density evolution approach is useless, or at least
misleading, for codes with finite blocklength. One might think that one could
solve equations (3) and (4) for any finite code, and hope that ignoring the
presence of loops is not too important a mistake. This does not work
out, as one can simply see by considering regular Gallager codes. Equations
(35), (36), and (37) for a finite blocklength regular Gallager code will have
exactly the same solutions as one would find in the infinite-blocklength limit,
so one would not predict any finite-size effects. Simulations, on the other 
hand, show that the real performance of finite-blocklength regular Gallager 
codes is considerably different (and worse) than that predicted by such a 
naive approach. 



3 The renormalization group approach

3.1 Intuition

The basic idea behind the "real-space" renormalization group approach from 
physics [19] is very similar to the idea behind recursion from computer
science. To evaluate the performance of a large but finite code, we try to replace
the code with a slightly smaller code with the same performance. In particular,
at each step in the process, we keep a Tanner graph and a set of $p_{ia}$ and
$q_{ai}$ variables just as in the density evolution approach. We will call the
combination of a Tanner graph and the $p$ and $q$ variables a "decorated Tanner
graph." The heart of the RG approach is the RG transformation, by which
we eliminate ("renormalize away") one node in the decorated Tanner graph,
and adjust the remaining values of the $p$ and $q$ messages so that the new code
has a decoding failure rate as close as possible to the old code. With each
renormalization step, the decorated Tanner graph representing our code will 
thus shrink by one node, until it is finally small enough that the performance 
of the code can be computed exactly in an efficient way. 

We will explain all the details in the following sub-sections, but in general, 
the RG algorithm will work as follows: 

1. Choose a "target" variable node $i$ for which we want to compute the
decoding failure rate $b_i$.

2. While the number of nodes remaining in the graph is greater than the 
number that one can comfortably handle exactly, repeatedly renormal- 
ize away nodes from the graph according to the following procedure: 

(a) Mark the "distance" of every node from the "target" node. The 
distance between two nodes is the minimal number of nodes that 
one needs to pass through on the graph to travel from one node 
to the other. 

(b) As long as there are any "leaf" (a leaf is a node which is only 
connected to one other node in the graph) check or variable nodes, 
renormalize them away. The order in which they are renormalized 
away will not matter, but for concreteness, we will renormalize 
those furthest from the "target" node first, breaking ties randomly. 

(c) Otherwise, choose a single variable node from among those fur- 
thest from the target node, that has the fewest neighboring check 
nodes, and renormalize it away. 

3. Compute $b_i$ for the remaining graph exactly.
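
The node-selection rule in step 2 can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: nodes are labeled `("v", i)` or `("c", a)`, distance is measured as the number of edges on a shortest path (which orders nodes the same way as counting intermediate nodes), and ties are broken deterministically by label instead of randomly so that the example is reproducible. The names `build_graph`, `distances`, and `choose_node` are hypothetical.

```python
from collections import deque

def build_graph(checks):
    """Build an adjacency map from {check label: [variable labels]}."""
    adj = {}
    for a, variables in checks.items():
        for i in variables:
            adj.setdefault(("c", a), set()).add(("v", i))
            adj.setdefault(("v", i), set()).add(("c", a))
    return adj

def distances(adj, target):
    """Breadth-first distances (in edges) of every reachable node from target."""
    dist = {target: 0}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def choose_node(adj, target):
    """Pick the next node to renormalize away, following steps 2(a)-(c)."""
    dist = distances(adj, target)
    far = lambda n: dist.get(n, float("inf"))
    candidates = [n for n in adj if n != target]
    leaves = [n for n in candidates if len(adj[n]) == 1]
    if leaves:                                   # step 2(b): furthest leaf first
        return max(leaves, key=lambda n: (far(n), n))
    variables = [n for n in candidates if n[0] == "v"]
    furthest = max(far(n) for n in variables)    # step 2(c)
    pool = [n for n in variables if far(n) == furthest]
    return min(pool, key=lambda n: (len(adj[n]), n))

tree = build_graph({1: [1, 2], 2: [2, 3, 4]})
print(choose_node(tree, ("v", 2)))
```

The function only selects a node; applying the corresponding RG transformation and deleting the node from the graph are separate steps.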

It should be clearly understood that the RG approach is approximate, 
and that there exists considerable freedom in the implementation. Different 
choices made in the implementation will lead to slightly different results. One 
can deal with this problem by constructing a series of systematically better 
RG approximations that should eventually converge to the exact answer.
We shall see how this works out in our case in section 3.4. In the physics
literature, this type of approach is well-known; the interested reader should
consult the book [19].

3.2 The RG transformation for Tanner graphs with no loops

First we consider loop-free Tanner graphs, and write down the RG trans- 
formations that are sufficient to give exact results for such codes. In later 
subsections, we will extend the RG transformations in order to obtain good 
approximate results for Tanner graphs with loops. 

We will always initialize our decorated Tanner graph such that all $b_i = x$,
all $p_{ia} = x$, and all $q_{ai} = 0$. Imagine that we are interested in the decoding
failure rate $b_i$ at a specific node $i$. Our procedure will be to obtain $b_i$ by
repeatedly renormalizing away nodes, other than the variable node $i$ itself,
that are "leaves" of the decorated Tanner graph.

The first possibility that we need to concern ourselves with is when we
renormalize away a "leaf" variable node $i$ that is connected to a single check
node $a$. Clearly, when the node $i$ vanishes, $p_{ia}$ and $q_{ai}$ will also be discarded.
We need to renormalize all the $q_{aj}$ variables leading out of the check $a$ to
other nodes $j$. Our formula will be

$$ q_{aj} \leftarrow 1 - (1 - q_{aj})(1 - p_{ia}) \tag{38} $$

where the left arrow indicates that we replace the old value of $q_{aj}$ with this
new value. Notice that each renormalization of $q_{aj}$ will increase its value.

When we renormalize away a "leaf" check node $a$ that is only connected to
a single variable node $i$, we need to adjust the values of all the $p_{ib}$ variables
leading to other checks $b$ that node $i$ is attached to. The renormalization
group transformation will be

$$ p_{ib} \leftarrow p_{ib} q_{ai} \tag{39} $$

Notice that each renormalization of $p_{ib}$ will decrease its value. At the same
time, we should also renormalize the $b_i$ as follows:

$$ b_i \leftarrow b_i q_{ai} \tag{40} $$

When only the "target" node $i$ remains, we can just read off the current value
of $b_i$ and that will serve as the RG prediction.
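
The two leaf transformations are enough to write a complete RG evaluator for loop-free graphs. The sketch below is a minimal illustration and makes several assumptions: the graph is connected and loop-free, leaves are removed in whatever order they are encountered (the text notes that the order does not matter), and the function name `rg_tree` and its dictionary-based graph format are hypothetical.

```python
def rg_tree(checks, x, target):
    """Return b_target for a loop-free Tanner graph using equations (38)-(40).

    checks maps each check label a to the list of variable labels attached to it.
    """
    chk_nbrs = {a: set(vs) for a, vs in checks.items()}
    var_nbrs = {}
    for a, vs in checks.items():
        for i in vs:
            var_nbrs.setdefault(i, set()).add(a)

    p = {(i, a): x for a, vs in checks.items() for i in vs}
    q = {(a, i): 0.0 for a, vs in checks.items() for i in vs}
    b = {i: x for i in var_nbrs}

    while chk_nbrs:
        progressed = False
        for i in list(var_nbrs):
            if i != target and len(var_nbrs[i]) == 1:      # leaf variable node
                (a,) = var_nbrs.pop(i)
                chk_nbrs[a].discard(i)
                for j in chk_nbrs[a]:                      # equation (38)
                    q[(a, j)] = 1.0 - (1.0 - q[(a, j)]) * (1.0 - p[(i, a)])
                progressed = True
        for a in list(chk_nbrs):
            if len(chk_nbrs[a]) <= 1:                      # leaf (or empty) check node
                for i in chk_nbrs.pop(a):
                    var_nbrs[i].discard(a)
                    for other in var_nbrs[i]:              # equation (39)
                        p[(i, other)] *= q[(a, i)]
                    b[i] *= q[(a, i)]                      # equation (40)
                progressed = True
        if not progressed:
            raise ValueError("no leaf left: the graph has loops")
    return b[target]

x = 0.3
print(rg_tree({1: [1, 2], 2: [2, 3, 4]}, x, target=2), 2 * x**3 - x**4)
```

The example uses the four-bit code with checks on bits (1, 2) and (2, 3, 4) and compares the RG result for the second bit with the polynomial $2x^3 - x^4$ at one sample erasure rate.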

3.3 A small example

The RG procedure might be easier to understand if we work through a small 
example. Recall the code defined by the parity check matrix

$$ A = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 1 \end{pmatrix} \tag{41} $$

Figure 3: Decorated Tanner graph.

Let us imagine that we would like to compute the decoding failure rate 
at the second variable node, $b_2$. We initialize $p_{11} = p_{21} = p_{22} = p_{32} = p_{42} = x$,
$q_{11} = q_{12} = q_{22} = q_{23} = q_{24} = 0$, and $b_2 = x$. In figure 3, we show the
decorated Tanner graph for this code. All of the variable nodes other than
variable node 2 are leaf nodes, so we can renormalize any of them away.
According to our general algorithm, we should renormalize away the one
furthest from node 2, breaking ties randomly. Let's say we choose variable
node 4. Then we discard $p_{42}$ and $q_{24}$ and obtain new values $q_{22} = x$ and
$q_{23} = x$ using equation (38). The new decorated Tanner graph is shown in
figure 4. Let's next renormalize away variable node 3. We discard $p_{32}$ and $q_{23}$
and renormalize $q_{22}$ to the value $1 - (1 - x)^2 = 2x - x^2$. The new decorated
Tanner graph is shown in figure 5. Next we renormalize away variable node 1.
We discard $p_{11}$ and $q_{11}$ and obtain the new renormalized value $q_{12} = x$;
the Tanner graph is now shown in figure 6. Next we renormalize away check
node 2. We can discard $p_{22}$ and $q_{22}$ and obtain $p_{21} = b_2 = 2x^2 - x^3$ (shown in
figure 7). Finally we renormalize away check node 1. We are left with only
a single node (our original node 2) and $b_2$ gets renormalized to its correct
value $b_2 = 2x^3 - x^4$.

Figure 4: Decorated Tanner graph after renormalizing variable node 4.

Figure 5: Decorated Tanner graph after renormalizing variable nodes 3 and 4.

Figure 6: Decorated Tanner graph after renormalizing variable nodes 1, 3
and 4.

This example makes it clear why the RG approach is exact for a code 
defined on a graph without loops: the RG transformations essentially recon- 
struct the density evolution equations, and we know that density evolution 
is exact for such codes. As we shall see, the advantage of the RG approach 
is that it still gives a good approximation for codes defined on graphs with 
loops. 

3.4 The RG approach for a graph with loops 

For a code defined on a graph that has loops, we will eventually have to
renormalize away a variable node $i$ that is not a "leaf" node. (Note that we could
also renormalize away non-leaf check nodes by defining the appropriate RG
transformations, but we will choose instead to always renormalize away
non-leaf variable nodes.) To do that, we first collect all the check nodes $a$, $b$, etc.,
that node $i$ is attached to. Obviously, we will discard $q_{ai}$, $q_{bi}$, $p_{ia}$, $p_{ib}$, etc. For
any given check node attached to $i$ (say check node $a$), we must also collect
all the other variable nodes $j$ attached to $a$, and renormalize the values of
$q_{aj}$. In figure 8, we illustrate the process of removing a non-leaf node.

Figure 7: Decorated Tanner graph after renormalizing variable nodes 1, 3
and 4, and check node 2.

The renormalization of the $q_{aj}$ variable can be done to varying degrees of
accuracy. The simplest approach would be to use equation (38) directly. The
problem with this approach is that the value of $p_{ia}$ which is used will always
be an over-estimate. Recall that $p_{ia}$ decreases with every renormalization.
Since we are renormalizing away the $i$th node before it has become a leaf
node, $p_{ia}$ has not yet been fully renormalized, and is thus over-estimated.

Instead of using $p_{ia}$ directly, we could use the value that it would have
after we renormalized away all the checks connected to it; that is, we could
replace $p_{ia}$ in equation (38) with an effective $p^{\mathrm{eff}}_{ia}$ given by

$$ p^{\mathrm{eff}}_{ia} = p_{ia} \prod_{b \in N(i) \setminus a} q_{bi} \tag{42} $$

Figure 8: Removing the non-leaf node $i$. The arrows indicate the $q$ variables
that will be renormalized as a result.

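On the other hand, we know that the values of the $q_{bi}$ are under-estimates
since they have not yet been fully renormalized either, so $p^{\mathrm{eff}}_{ia}$ as written above
would also be an under-estimate. We could attempt to correct this mistake
by going another level further: before we estimate a $p^{\mathrm{eff}}_{ia}$, we first re-estimate
the $q_{bi}$ which feed into it. Thus, we replace the $p_{ia}$ in equation (38) with an
effective $p^{\mathrm{eff}}_{ia}$ given by

$$ p^{\mathrm{eff}}_{ia} = p_{ia} \prod_{b \in N(i) \setminus a} q^{\mathrm{eff}}_{bi} \tag{43} $$

where $q^{\mathrm{eff}}_{bi}$ is in turn given by

$$ q^{\mathrm{eff}}_{bi} = 1 - (1 - q_{bi}) \prod_{k \in N(b) \setminus i} (1 - p_{kb}) \tag{44} $$

Putting all these together, we finally get the RG transformation

$$ q_{aj} \leftarrow 1 - (1 - q_{aj}) \left( 1 - p_{ia} \prod_{b \in N(i) \setminus a} \left[ 1 - (1 - q_{bi}) \prod_{k \in N(b) \setminus i} (1 - p_{kb}) \right] \right) \tag{45} $$

Figure 9: Node $i$ sees a local tree-like structure.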

The RG transformation (45) is worth explaining in more detail. In figure
9, we illustrate the equation for the case where variable node $i$ is attached to
three check nodes $a$, $b$, and $c$, and check node $a$ is in turn attached to a
variable node $j$. Check nodes $b$ and $c$ in turn are connected to their own variable
nodes labeled $k$, $l$, $m$, and $n$. We would like to know the new probability
$q_{aj}$ that check node $a$ will send variable node $j$ an erasure message, taking
into account the information that flows through node $i$. We already have
some previous accumulated probability $q_{aj}$ that check node $a$ sends variable
node $j$ an erasure message (because of other nodes previously attached to
$a$ that have already been renormalized). The new probability of an erasure
message can be figured out from a logical argument: "$m_{aj}$ will be an erasure
if it was already, or if $m_{ia}$ is an erasure and ($m_{bi}$ or $m_{kb}$ or $m_{lb}$ are erasures)
and ($m_{ci}$ or $m_{mc}$ or $m_{nc}$ are erasures)." Converting such a logical
argument into an equation for probabilities is straightforward: when we see "$m_1$
and $m_2$" for two statistically independent messages in a logical argument,
it translates to $(p_1 p_2)$ for the corresponding probabilities, while "$m_1$ or $m_2$"
translates to $(1 - (1 - p_1)(1 - p_2))$. Converting our full logical argument for
figure 9 into an equation for probabilities, we thus recover an example of the
RG transformation (45).
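
The depth-two update for a tree-like neighborhood can be sketched as a small function. This is an illustration only: the function name `renormalized_q` and the way the neighborhood is passed in (one pair per check $b \neq a$ attached to $i$, holding $q_{bi}$ and the list of $p_{kb}$ for the other variables on $b$) are assumptions made for this example.

```python
def renormalized_q(q_aj, p_ia, other_checks):
    """Depth-two update of q_aj when a non-leaf variable node i is removed.

    other_checks holds one (q_bi, [p_kb, ...]) pair for every check b != a
    attached to i; the list contains p_kb for the other variables k on check b.
    """
    p_eff = p_ia
    for q_bi, p_kbs in other_checks:
        no_erasure = 1.0 - q_bi
        for p_kb in p_kbs:
            no_erasure *= 1.0 - p_kb
        p_eff *= 1.0 - no_erasure              # effective q_bi of equation (44)
    # Same form as equation (38), with p_ia replaced by its effective value
    return 1.0 - (1.0 - q_aj) * (1.0 - p_eff)

# Neighborhood of figure 9 with freshly initialized values (p = x, q = 0).
x = 0.1
new_q = renormalized_q(q_aj=0.0, p_ia=x, other_checks=[(0.0, [x, x]), (0.0, [x, x])])
print(new_q)
```

With these initial values each effective $q_{bi}$ is $1 - (1 - x)^2 = 0.19$, so the update gives $0.1 \times 0.19^2$, well below the value 0.1 that equation (38) would give if $p_{ia}$ were used directly.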




Figure 10: Node i sees a local neighborhood with loops. 



We should always take our RG transformation for $q_{aj}$ to correspond to the
logic of the local neighborhood around the node $i$ that we are removing. In
fact, the RG transformation given in equation (45) is only appropriate if the
local neighborhood of node $i$ is tree-like, and must be corrected if there are
short loops in the local neighborhood. For example, in figure 10, we illustrate
a case where a variable node $k$ is attached to two check nodes $b$ and $c$ which
are each attached to the node $i$ that we plan to remove. First consider the
renormalization of $q_{aj}$. Note that before check nodes $b$ or $c$ are renormalized,
the probabilities $p_{kb}$ and $p_{kc}$ that variable node $k$ sends out an erasure must
be identical, because all renormalizations of $p_{kb}$ and $p_{kc}$ happen in tandem.
Our logic argument for whether check node $a$ will send variable node $j$ an
erasure message would thus be: "$m_{aj}$ will be an erasure if it was already, or if
($m_{ia}$ is an erasure) and (($m_{kb}$ is an erasure) or ($m_{bi}$ and $m_{ci}$ are erasures))."
(We have used the fact that at this stage in the renormalization process, if
$m_{kb}$ is an erasure, $m_{kc}$ must be as well.) Converting our logic argument into
an RG transformation, we get

$$ q_{aj} \leftarrow 1 - (1 - q_{aj}) \left( 1 - p_{ia} \left( 1 - (1 - p_{kb})(1 - q_{bi} q_{ci}) \right) \right) \tag{46} $$
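
The translation from a logical argument to a probability formula can be checked by brute force: enumerate every combination of independent erasure events and add up the weight of those for which the logical statement holds. The sketch below does this for the short-loop statement above. The event names, the sample probabilities, and the helper `erasure_probability` are all illustrative assumptions; `"old"` stands for the event that $m_{aj}$ was already an erasure, with probability $q_{aj}$.

```python
from itertools import product

def erasure_probability(event, probs):
    """Probability that event(flags) holds for independent erasure flags."""
    names = list(probs)
    total = 0.0
    for flags in product((False, True), repeat=len(names)):
        weight = 1.0
        for name, flag in zip(names, flags):
            weight *= probs[name] if flag else 1.0 - probs[name]
        if event(dict(zip(names, flags))):
            total += weight
    return total

# Arbitrary sample values for q_aj, p_ia, p_kb, q_bi, and q_ci.
probs = {"old": 0.05, "m_ia": 0.3, "m_kb": 0.2, "m_bi": 0.4, "m_ci": 0.25}

def loop_event(m):
    # "erasure already, or m_ia and (m_kb or (m_bi and m_ci))"
    return m["old"] or (m["m_ia"] and (m["m_kb"] or (m["m_bi"] and m["m_ci"])))

brute = erasure_probability(loop_event, probs)
formula = 1 - (1 - 0.05) * (1 - 0.3 * (1 - (1 - 0.2) * (1 - 0.4 * 0.25)))
print(brute, formula)
```

The two printed numbers agree, which confirms that the product form is the probability of the logical statement when the five events are independent. The same helper can be reused for other local neighborhoods by changing `loop_event`.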

The appropriate renormalizations of $q_{bk}$ and $q_{ck}$ are more
complicated: the messages $m_{bk}$ and $m_{ck}$ are correlated because of
node $i$, and we must keep track of that correlation after node $i$ is
removed. We have tried several relatively ad hoc rules for assigning
renormalized values to $q$'s that all arrive at the same node (such as
renormalizing the product $q_{bk} q_{ck}$ as a whole), but found the results
to be unsatisfactory because they depended sensitively on the details of the
rules. In general, to correctly account for the correlations caused by such
short loops, we shall need to introduce additional variables beyond the $q$
and $p$ variables that we use here. We defer a detailed discussion of this
complex issue to another paper [20]. In this paper, we will restrict our
examples to codes where the local structure is always tree-like and such
short loops do not exist.

The procedure we are describing for renormalizing a non-leaf variable node
$i$ can be made increasingly accurate by increasing the size of the
neighborhood around the node $i$ that is treated correctly. Naturally, as we
increase the size of the neighborhood, we must pay for the increased accuracy
with greater computation. We will use the following terminology: if, when
renormalizing the node $i$, we use the values of $p_{ia}$ directly, we will
say that the resulting RG transformations have a "depth" of one. If we first
adjust the values of $p_{ia}$ by considering all the check nodes $a$ attached
to $i$ and all the variable nodes $k$ attached to those check nodes, we will
say the resulting RG transformations (e.g. those described above) have a
depth of two. If we go one step further and also consider the check nodes
attached to the variable nodes $k$ and the variable nodes attached to those
check nodes, we say the RG transformations have a depth of three, and so on.



3.5 Finishing the RG computation exactly 

In the RG approach, we can always renormalize nodes away until we are left
with just our "target" node $i$, and then read off the decoding failure rate
$b_i$ for that node. On the other hand, after we have renormalized away
enough nodes, we could just as well finish the computation exactly.

For the purposes of describing the exact computation, we assume that we are
given a Tanner graph of $N$ nodes, and associated with each node $i$ is an
erasure probability $x_i$. (This is a little different from the decorated
Tanner graph we are used to dealing with, but we shall show how to convert a
Tanner graph into such a form.) To exactly compute the decoding failure rate
of a given node $i$, we generate all $2^N$ possible received message blocks
(ranging from the correct all-zeros message all the way to the all-erasures
message), and decode each of them using a BP decoder. Each message block has
a probability

$$p = \prod_{i\ \mathrm{erased}} x_i \prod_{j\ \mathrm{not\ erased}} (1 - x_j) \qquad (47)$$

where the first product is over all nodes that are erased and the second
product is over all nodes that are not erased. We simply compute $b_i$ by
taking the weighted average over all received messages of the probability
that node $i$ decodes to an erasure. Of course, the complexity of the exact
calculation is $O(2^N)$, so we are restricted to small $N$, but nevertheless
one can gain some accuracy by switching to an exact calculation after one has
renormalized away enough nodes.
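The exact calculation just described can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions, not the authors' code: the graph is given as a list of checks (each a list of variable-node indices), BP decoding on the BEC is implemented as the usual iterative "peeling" of checks with a single erased neighbor, and the function names are hypothetical.

```python
from itertools import product

def bp_decode_erasures(checks, erased):
    """BP (peeling) decoding on the BEC; returns the set of nodes still erased.

    checks: list of lists of variable-node indices, one list per parity check.
    erased: iterable of variable-node indices erased by the channel.
    """
    erased = set(erased)
    progress = True
    while progress:
        progress = False
        for check in checks:
            unknown = [v for v in check if v in erased]
            if len(unknown) == 1:       # a check with one erasure resolves it
                erased.discard(unknown[0])
                progress = True
    return erased

def exact_failure_rates(checks, x):
    """Decoding failure rate of every node, summing over all 2^N patterns."""
    n = len(x)
    b = [0.0] * n
    for pattern in product([False, True], repeat=n):
        weight = 1.0                    # probability of this erasure pattern
        for is_erased, x_i in zip(pattern, x):
            weight *= x_i if is_erased else 1.0 - x_i
        if weight == 0.0:
            continue
        still_erased = bp_decode_erasures(
            checks, [i for i, e in enumerate(pattern) if e])
        for i in still_erased:
            b[i] += weight
    return b

# Toy example: one parity check on three bits, each erased with probability 0.5.
rates = exact_failure_rates([[0, 1, 2]], [0.5, 0.5, 0.5])
print(rates)
```

In the toy example a bit stays erased only if it and at least one of the other two bits are erased, so each failure rate is $0.5 \times 0.75 = 0.375$.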

The one subtlety in the exact final calculation is that one needs a Tanner
graph and the associated erasure probabilities at each node, but in the RG
approach, we manipulate decorated Tanner graphs. Fortunately, it is easy to
convert a decorated Tanner graph into the appropriate form. Note that at each
step of the RG approach, all the probabilities $q_{ai}$ leading out of the
check node $a$ must be equal (we say $q_{ai} = q_a$) and all the
probabilities $p_{ia}$ leading out of the variable node $i$ will be equal (we
say $p_{ia} = p_i$). We can set all the $q_a$ probabilities equal to zero if
we expand the graph by adding a new variable node $k$ to node $a$ with
$p_{ka} = q_a$. When we are left with a decorated Tanner graph such that all
$q$ probabilities are zero, and all $p_{ia}$ probabilities coming out of each
variable node are equal to $p_i$, we may interpret the $p_i$ as the erasure
probabilities of the variable nodes. In figure 11, we give an example of
expanding a decorated Tanner graph into an equivalent Tanner graph with
erasure probabilities.
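The expansion step is simple enough to write down directly. The sketch below is an assumption-laden illustration: the data layout (a list of checks plus lists `p` and `q` holding the common values $p_i$ and $q_a$) and the function name are hypothetical choices, not taken from the paper.

```python
def expand_decorated_graph(checks, p, q):
    """Turn a decorated Tanner graph into a plain one with erasure probabilities.

    checks: list of lists of variable indices attached to each check node.
    p: p[i] is the common value p_i of all p_ia leaving variable node i.
    q: q[a] is the common value q_a of all q_ai leaving check node a.
    """
    new_checks = [list(check) for check in checks]
    x = list(p)
    for a, q_a in enumerate(q):
        if q_a > 0.0:
            k = len(x)              # index of the new variable node
            x.append(q_a)           # it is erased with probability p_ka = q_a
            new_checks[a].append(k)
    return new_checks, x

new_checks, x = expand_decorated_graph(
    [[0, 1], [1, 2]], [0.1, 0.3, 0.2], [0.25, 0.0])
print(new_checks, x)
```

Here only the first check has a nonzero $q_a$, so a single extra variable node (index 3) is attached to it; the returned pair can be passed to any routine that expects a plain Tanner graph with per-node erasure probabilities.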

Figure 11: Expanding a decorated Tanner graph into an equivalent Tanner
graph with erasure probabilities.

3.6 Extension to generalized parity check matrices

Many of the best modern codes, such as turbo-codes, Kanter-Saad codes, and
repeat-accumulate codes, are easily represented in terms of generalized
parity check matrices. In a generalized parity check matrix, additional
columns are added to a parity check matrix which represent "hidden nodes":
state variables which are not transmitted. A good notation for the state
variables is a horizontal line above the corresponding columns. For example,
we would write

$$A = \begin{pmatrix} \bar{1} & 1 & 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 & 0 & 1 \end{pmatrix} \qquad (48)$$

to indicate a code where the first variable node was a hidden node. To
indicate that a variable node is a hidden node in our graphical model, we use
an open circle rather than a filled-in circle. Such a graph, which
generalizes Tanner graphs, is called a "Wiberg graph" [21, 22]. In figure 12,
we give the Wiberg graph corresponding to the code defined by the generalized
parity check matrix (48).




The generalization of our RG procedure to handle Wiberg graphs is very
straightforward. We initialize the probabilities $p_{ia}$ coming out of a
hidden node at 1, instead of at the erasure rate $x$ as we do for ordinary
transmitted variable nodes. This reflects the fact that hidden nodes are
automatically erased, while ordinary variable nodes are only erased with
probability $x$.
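As an illustration of this initialization, the following sketch builds the initial $p_{ia}$ values from a generalized parity check matrix. The function name, the dictionary layout and the small example matrix are hypothetical; setting every $q_{ai}$ to zero at the start is an assumption made here by analogy with the AWGN initialization, not something stated in this passage.

```python
def init_decorated_graph(A, hidden, x):
    """Initial p and q values for the graph of a generalized parity check matrix.

    A: list of rows of 0/1 entries (rows are checks, columns are variables).
    hidden: set of column indices that are hidden (untransmitted) nodes.
    x: channel erasure rate for ordinary transmitted bits.
    """
    p, q = {}, {}
    for a, row in enumerate(A):
        for i, entry in enumerate(row):
            if entry:
                p[(i, a)] = 1.0 if i in hidden else x   # hidden nodes start erased
                q[(a, i)] = 0.0                         # assumed starting value
    return p, q

# Illustrative matrix (not from the paper); column 0 is a hidden node.
A = [[1, 1, 0],
     [1, 0, 1]]
p, q = init_decorated_graph(A, hidden={0}, x=0.3)
print(p)
```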

4 Comparison with numerical simulations 

We now present a comparison of the predictions of our RG approach with
numerical simulations. We first used a parity check matrix corresponding to
a (3, 5) regular Gallager code with $N = 60$ and $k = 24$. That is, each of
the 36 rows in the parity check matrix had 5 entries that were ones (the rest
were zeros), and each of the 60 columns had 3 entries that were ones. There
were no hidden nodes.

We also took care to ensure that no two parity checks shared more than one
variable node. That meant that there were no loops of length four, so we
could use the RG transformation (45) (an RG transformation of "depth" 2)
whenever we renormalized away a non-leaf variable node. We renormalized nodes
away until we were left with 7 nodes, and then finished the computation
exactly.

Figure 12: A Wiberg graph.

We considered erasure rates $x$ at intervals of 0.05 between $x = 0$ and
$x = 1$. When we used the RG approximation, we averaged our decoding failure
rates $b_i$ over all 60 nodes $i$ to get an overall bit error rate. Our
numerical simulations consisted of 1000 trials at each erasure rate, decoded
according to the standard BP decoding algorithm.
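A simulation of this kind can be sketched as follows. This is a hedged illustration rather than the code used for the figures: the toy code with three checks is a stand-in (it is not the $N = 60$ Gallager code), the function names are hypothetical, and BP decoding on the BEC is again implemented as iterative peeling.

```python
import random

def bp_decode_erasures(checks, erased):
    """BP (peeling) decoding on the BEC; returns the set of nodes still erased."""
    erased = set(erased)
    progress = True
    while progress:
        progress = False
        for check in checks:
            unknown = [v for v in check if v in erased]
            if len(unknown) == 1:       # a check with one erasure resolves it
                erased.discard(unknown[0])
                progress = True
    return erased

def simulate_ber(checks, n, x, trials, seed=0):
    """Monte Carlo estimate of the bit error rate at erasure rate x."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        erased = [i for i in range(n) if rng.random() < x]
        failures += len(bp_decode_erasures(checks, erased))
    return failures / (trials * n)

# Toy stand-in code with six bits and three checks (illustrative only).
checks = [[0, 1, 2], [2, 3, 4], [4, 5, 0]]
for x in [0.0, 0.25, 0.5, 0.75, 1.0]:
    print(x, simulate_ber(checks, 6, x, trials=1000))
```

The estimate is averaged over all bits, which matches the way the RG failure rates are averaged over nodes before being compared with simulation.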




Figure 13: Simulation results compared with RG and density evolution
predictions for a small rate 2/5 60-bit blocklength regular Gallager code.
(Horizontal axis: erasure rate.)

Our results are presented in figure 13, where we compare the simulation
results with the prediction of our RG approach and the density evolution
approach. As one can see, the agreement between the RG approach and
simulations is quite good.

The density evolution prediction is precisely the same as it would be in the
infinite-blocklength limit. Of course, nobody claims that the density
evolution approach should be taken seriously for blocklengths as low as 60,
and figure 13 shows why: the density-evolution prediction of a threshold-like
behavior is completely incorrect for small or medium blocklength regular
Gallager codes.

We then constructed, by a somewhat random procedure, a particular 
irregular Gallager code of rate 2/5 and blocklength N = 100. Each variable 
node belonged to between one and four parity checks, and each parity check 
involved between three and five variable nodes. No special effort was made to 
construct a particularly good error-correcting code, but we did ensure that 
the Tanner graph had no short loops of length four or six (counting both 
variable and check nodes). That meant that all local neighborhoods could 
be considered tree-like up to RG transformations of depth 3. 

Our procedures were the same as described for the regular Gallager code 
except that we implemented RG transformations of depth 1, 2, and 3. Our 
numerical simulations consisted of 5000 trials for all erasure rates x < .6, 
and 1000 trials for higher erasure rates. 

In figure 14, we compare the simulation results with the prediction of our
RG approach for the bit error rate averaged over all nodes. As one can see,
the agreement is remarkably good, especially for the RG transformations of
depth 3. The density evolution prediction spuriously shows a quasi-threshold
behavior around $x \approx 0.555$.

The irregular Gallager code has interesting variation in its bit error rates
across the different bits of the code. In figure 15 we plot the predicted
(using depth 3 RG transformations) and simulated bit error rates for every
bit in the code at an erasure rate of $x = 0.55$. This plot demonstrates that
the RG approach can in fact predict the bit-by-bit variation in the bit error
rate. Although the RG prediction is systematically slightly too high at this
erasure rate, it captures the ordering of how easily the bits are decoded
quite well.



Figure 14: Simulation results compared with RG predictions using depths from
one to three and density evolution predictions for a small rate 2/5 100-bit
blocklength irregular Gallager code. (Plot title: "Random parity check code
in Binary Erasure Channel"; horizontal axis: erasure rate.)

5 Extension to the Gaussian noise channel

5.1 Background 

In this section, we consider the extension of the RG approach to the additive
white Gaussian noise (AWGN) channel. We will build on the Gaussian
approximation to density evolution for the AWGN channel described by Chung et
al., so we first describe that approximation.

In the AWGN channel, there are only two possible inputs, 0 and 1, but the
output alphabet is the set of real numbers: if $x$ is the input, then the
output would be $y = (-1)^x + z$, where $z$ is a Gaussian random variable
with zero mean and variance $\sigma^2$. For each received bit $i$ in the
code, we can compute the log-likelihood ratio
$m_i^0 = \ln(p(y_i|x_i = 0)/p(y_i|x_i = 1))$, which tells us the relative
log-likelihood that the transmitted bit $i$ was a zero given that the
received real number is $y_i$.

Figure 15: Bit-by-bit comparison of simulation results and RG predictions
for a small rate 2/5 100-bit blocklength irregular Gallager code at an
erasure rate of $x = 0.55$ in the BEC.

We assume that we are again dealing with codes defined by generalized parity
check matrices, that we always transmit the all-zeros codeword, and that the
decoding algorithm is the sum-product belief propagation algorithm. In this
decoding algorithm, we iteratively solve for real-valued messages: $m_{ia}$
from variable nodes $i$ to check nodes $a$, and $m_{ai}$ from check nodes $a$
to variable nodes $i$. The messages $m_{ia}$ are log-likelihood ratios by
which the node $i$ informs the node $a$ of its probability of being a 0 or 1.
For example, $m_{ia} \to \infty$ means that node $i$ is certain it should be
a 0, while $m_{ia} = 1$ means that variable node $i$ is telling check node
$a$ that $\ln(p(x_i = 0)/p(x_i = 1)) = 1$. The messages $m_{ai}$ are
log-likelihood ratios which should be interpreted as information from the
check node $a$ to the variable node $i$ about what state node $i$ should be
in.

In the sum-product algorithm, the messages are iteratively solved according
to the update rules:

$$m_{ia} = \sum_{b \in N(i) \setminus a} m_{bi} + m_i^0 \qquad (49)$$

(if $i$ is a hidden node, the $m_i^0$ term is omitted) and

$$\tanh(m_{ai}/2) = \prod_{j \in N(a) \setminus i} \tanh(m_{ja}/2). \qquad (50)$$
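The two update rules translate directly into code. The sketch below is illustrative only: the function names are hypothetical, messages are passed in as plain lists, and the clipping of the tanh product is a numerical safeguard added here, not part of the rules themselves.

```python
import math

def variable_to_check(m0_i, incoming, hidden=False):
    """Rule (49): channel LLR plus the messages from the other checks.

    incoming: messages m_bi from all checks b in N(i) other than a.
    hidden: if True, the channel term is omitted.
    """
    total = sum(incoming)
    return total if hidden else total + m0_i

def check_to_variable(incoming):
    """Rule (50): tanh rule over messages m_ja from all j in N(a) other than i."""
    prod = 1.0
    for m in incoming:
        prod *= math.tanh(m / 2.0)
    # Clip so that atanh stays finite when all inputs are very confident.
    prod = max(min(prod, 1.0 - 1e-15), -1.0 + 1e-15)
    return 2.0 * math.atanh(prod)

print(variable_to_check(0.8, [1.5, -0.3]))
print(check_to_variable([2.0, 3.0]))
```

A check message is never more confident than its least confident input, which is why the second printed value is smaller than 2.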

In the density evolution approach for the AWGN channel, one considers the
probability distributions $p(m_{ia})$ and $p(m_{ai})$ for the messages, where
the probability distribution is an average over all possible received
blocks. A distribution $f(x)$ is called consistent if $f(x) = f(-x) e^x$ for
all $x$. Richardson and Urbanke proved that the consistency condition will be
preserved for the message probability distributions for all messages under
sum-product decoding. If we approximate the probability distributions
$p(m_{ia})$ and $p(m_{ai})$ as Gaussian distributions, the consistency
condition implies that the means $\mu$ of these distributions will be related
to the variances $\sigma^2$ by $\sigma^2 = 2\mu$. That means that we can
characterize the message probability distributions by a single parameter:
their mean.
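The relation between mean and variance follows in one line from the consistency condition. As a short worked derivation, using only the Gaussian form assumed in the text:

```latex
f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}
       \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
\quad\Longrightarrow\quad
\frac{f(x)}{f(-x)}
  = \exp\!\left(\frac{(x+\mu)^2 - (x-\mu)^2}{2\sigma^2}\right)
  = \exp\!\left(\frac{2\mu x}{\sigma^2}\right).
```

Requiring this ratio to equal $e^x$ for all $x$ gives $2\mu/\sigma^2 = 1$, that is, $\sigma^2 = 2\mu$.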

Thus, by making the approximation that the message probability distributions
are Gaussians, one can reduce the density evolution equations for the AWGN
channel to self-consistent equations for the means $v_{ia}$ of the
probability distributions of messages from variable nodes $i$ to check nodes
$a$, and the means $u_{ai}$ of the probability distributions of messages from
check nodes $a$ to variable nodes $i$. These equations are

$$v_{ia} = u_i^0 + \sum_{b \in N(i) \setminus a} u_{bi} \qquad (51)$$

where $u_i^0$ is the mean value of $m_i^0$ (this term is omitted for hidden
nodes), and

$$\phi(u_{ai}) = 1 - \prod_{j \in N(a) \setminus i} (1 - \phi(v_{ja})) \qquad (52)$$

where $\phi(x)$ is a function defined by

$$\phi(x) = 1 - \frac{1}{\sqrt{4\pi x}} \int_{-\infty}^{\infty} \tanh\frac{u}{2}\, e^{-(u-x)^2/(4x)}\, du. \qquad (53)$$

$\phi(x)$ can be approximated in a form that reproduces the correct limits as
$x \to 0$ and $x \to \infty$ and is more convenient for numerical purposes.
We choose

$$\phi(x) \approx \frac{e^{-x/4}}{\sqrt{1 + \beta x}} \left[ 1 + \left(\sqrt{\pi\beta} - 1\right) \frac{\alpha x}{1 + \alpha x} \right]. \qquad (54)$$

This form automatically has the correct leading behavior as $x \to 0$ and
$x \to \infty$ for any $\alpha$ and $\beta$. We fix $\alpha$ and $\beta$ by
matching the leading corrections in the two limits. We find
$\alpha \approx 0.163489$ and $\beta \approx 0.634765$. This approximation to
$\phi(x)$ is quite good for all values of $x$.
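The quality of the approximation can be checked numerically. The sketch below assumes that $\phi(x)$ is defined as one minus the average of $\tanh(u/2)$ over a Gaussian in $u$ with mean $x$ and variance $2x$, and that the approximate form is $e^{-x/4}(1+\beta x)^{-1/2}[1 + (\sqrt{\pi\beta}-1)\alpha x/(1+\alpha x)]$. Both formulas are written out in the code so that the assumptions are explicit; the function names and the trapezoidal integration are illustrative choices.

```python
import math

ALPHA = 0.163489    # values of alpha and beta quoted in the text
BETA = 0.634765

def phi_approx(x):
    """Closed-form approximation to phi(x), for x >= 0."""
    if x == 0.0:
        return 1.0
    if math.isinf(x):
        return 0.0
    bracket = 1.0 + (math.sqrt(math.pi * BETA) - 1.0) * ALPHA * x / (1.0 + ALPHA * x)
    return math.exp(-x / 4.0) / math.sqrt(1.0 + BETA * x) * bracket

def phi_integral(x, steps=4000, width=12.0):
    """Trapezoidal evaluation of 1 - E[tanh(u/2)], u ~ N(x, 2x), for x > 0."""
    sigma = math.sqrt(2.0 * x)
    lo, hi = x - width * sigma, x + width * sigma
    h = (hi - lo) / steps
    total = 0.0
    for n in range(steps + 1):
        u = lo + n * h
        value = math.tanh(u / 2.0) * math.exp(-(u - x) ** 2 / (4.0 * x))
        total += value if 0 < n < steps else 0.5 * value
    return 1.0 - total * h / math.sqrt(4.0 * math.pi * x)

for x in [0.1, 1.0, 5.0, 20.0]:
    print(x, phi_approx(x), phi_integral(x))
```

With the quoted values of $\alpha$ and $\beta$, the small-$x$ slope of the approximate form comes out at $-1/2$, which is the slope obtained by expanding $\tanh(u/2)$ to first order under the Gaussian average.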



5.2 RG transformations for the AWGN channel 

The density evolution equations (51) and (52) for the AWGN channel under
the Gaussian approximation are analogs of the density evolution equations 
(2) and (3) for the BEC. Our RG procedure for the AWGN channel
will be almost exactly the same as for the BEC channel; the main difference 
is that we need to change the RG transformations. 

Just as before, we can construct a set of RG transformations which exactly
reproduce the density evolution equations for a tree-like graph. We create a
decorated Tanner/Wiberg graph for the code by keeping $u_{ai}$ and $v_{ia}$
variables between each pair of connected nodes. The $u_{ai}$ variables are
initialized to $\infty$, while the $v_{ia}$ variables are initialized to
$u_i^0$, unless the $i$th node is a hidden node, in which case the $v_{ia}$
are initialized to zero. We also introduce the variables $h_i$ (analogous to
$b_i$ in the BEC) which are initialized like the $v_{ia}$ variables.

If we renormalize away a leaf check node $a$ attached to a variable node $i$,
we find the other check nodes $b$ attached to $i$ and apply the RG
transformations

$$v_{ib} \leftarrow v_{ib} + u_{ai} \qquad (55)$$

and

$$h_i \leftarrow h_i + u_{ai} \qquad (56)$$

while if we renormalize away a leaf variable node $i$ attached to a check
node $a$, we find the other variable nodes $j$ attached to $a$ and apply the
RG transformation

$$u_{aj} \leftarrow \phi^{-1}\left(1 - (1 - \phi(u_{aj}))(1 - \phi(v_{ia}))\right) \qquad (57)$$

Note that with each renormalization of $v_{ib}$, the magnitude of $v_{ib}$
will increase, while with each renormalization of $u_{aj}$, the magnitude of
$u_{aj}$ will decrease.

When we renormalize away a non-leaf variable node $i$ which is attached to
check nodes $a$, $b$, etc., we need to renormalize the variables like
$u_{aj}$, where $j$ is another variable node attached to check node $a$. Just
as for the BEC, we should consider a local neighborhood of nodes around the
node $i$. For example, if no variable nodes $j$ share two check nodes with
$i$ (there are no local loops of length four) then we can use the depth two
RG transformation

$$u_{aj} \leftarrow \phi^{-1}\left(1 - (1 - \phi(u_{aj}))(1 - \phi(v^{\rm eff}_{ia}))\right) \qquad (58)$$

where

$$v^{\rm eff}_{ia} = v_{ia} + \sum_{b \in N(i) \setminus a} \phi^{-1}\left(1 - (1 - \phi(u_{bi})) \prod_{k \in N(b) \setminus i} (1 - \phi(v_{kb}))\right) \qquad (59)$$
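The AWGN transformations of this section can be sketched in Python as follows. Several things here are assumptions made for illustration: the closed-form approximation used for $\phi$ (with the quoted $\alpha$ and $\beta$), the inversion of $\phi$ by bisection, the function names, and the way the neighborhood of node $i$ is passed in as a list of pairs. The structure of each update follows the BEC case, with sums of means replacing products of erasure probabilities.

```python
import math

ALPHA, BETA = 0.163489, 0.634765

def phi(x):
    """Approximate phi(x), with phi(0) = 1 and phi(inf) = 0."""
    if x <= 0.0:
        return 1.0
    if math.isinf(x):
        return 0.0
    bracket = 1.0 + (math.sqrt(math.pi * BETA) - 1.0) * ALPHA * x / (1.0 + ALPHA * x)
    return math.exp(-x / 4.0) / math.sqrt(1.0 + BETA * x) * bracket

def phi_inv(y, hi=1e4):
    """Invert the decreasing function phi by bisection on [0, hi]."""
    if y >= 1.0:
        return 0.0
    if y <= 0.0:
        return math.inf
    lo = 0.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if phi(mid) > y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def remove_leaf_check(v_ib, h_i, u_ai):
    """Leaf check node a removed: add u_ai to v_ib and to h_i."""
    return v_ib + u_ai, h_i + u_ai

def remove_leaf_variable(u_aj, v_ia):
    """Leaf variable node i removed: combine u_aj with v_ia through phi."""
    return phi_inv(1.0 - (1.0 - phi(u_aj)) * (1.0 - phi(v_ia)))

def effective_v(v_ia, other_checks):
    """Depth-two adjustment of v_ia.

    other_checks: list of (u_bi, [v_kb, ...]) for each check b in N(i)
    other than a, with k ranging over the neighbors of b other than i.
    """
    total = v_ia
    for u_bi, v_kbs in other_checks:
        prod = 1.0 - phi(u_bi)
        for v_kb in v_kbs:
            prod *= 1.0 - phi(v_kb)
        total += phi_inv(1.0 - prod)
    return total

def remove_nonleaf_variable(u_aj, v_ia, other_checks):
    """Depth-two update of u_aj when a non-leaf variable node i is removed."""
    return remove_leaf_variable(u_aj, effective_v(v_ia, other_checks))

u = remove_leaf_variable(math.inf, 2.0)   # u_aj starts at infinity
print(u)
print(remove_nonleaf_variable(u, 2.0, [(math.inf, [2.0, 2.0])]))
```

Starting from $u_{aj} = \infty$, the first leaf-variable update simply returns $v_{ia}$, and every later update lowers $u_{aj}$, in line with the remark above about the magnitudes.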

The RG procedure proceeds as in the BEC case until the final computation of
the bit error rate. For the AWGN channel, it will not normally be convenient
to stop the RG procedure before renormalizing all the way down to the
"target" node, because it is not simple to do an exact computation even with
just a few nodes in the code. When we have renormalized all but our target
node $i$, we will be left with a final renormalized value of $h_i$. Our
Gaussian approximation tells us that the probability distribution for the
node $i$ being decoded as a zero will be a Gaussian with mean $h_i$ and
variance $2h_i$. Decoding failures correspond to those parts of the
probability distribution which are below zero. Thus, our theoretical
prediction for the bit error rate at node $i$ will be

$$\frac{1}{\sqrt{4\pi h_i}} \int_{-\infty}^{0} e^{-(x - h_i)^2/(4h_i)}\, dx. \qquad (60)$$
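The Gaussian tail integral has a closed form in terms of the complementary error function: for a Gaussian with mean $h_i$ and variance $2h_i$, the probability of a value below zero is $\frac{1}{2}\,\mathrm{erfc}(\sqrt{h_i}/2)$. A minimal sketch, in which the function name and the convention of returning 1/2 when $h_i = 0$ are assumptions made here:

```python
import math

def bit_error_rate(h_i):
    """P(X < 0) for X Gaussian with mean h_i and variance 2 * h_i."""
    if h_i <= 0.0:
        return 0.5          # no information about the bit (assumed convention)
    return 0.5 * math.erfc(math.sqrt(h_i) / 2.0)

for h in [0.0, 1.0, 4.0, 16.0]:
    print(h, bit_error_rate(h))
```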



6 Speculations on the design of codes 

Given that the density evolution method has been used as a guide to designing
the best-known practical codes, it is natural to expect that we could design
even better codes using the RG approach. With the RG approach, we can input
a code defined by an arbitrary generalized parity check matrix, and obtain as
output a prediction of the bit error rate at each node. We could use this
output as the objective function for a guided search through the space of
possible codes. For example, say that we would like to find an $N = 100$
rate 1/2 code with no hidden states that achieves a bit error rate of less
than $10^{-4}$ at the smallest possible signal-to-noise ratio for the AWGN
channel. We could repeatedly evaluate codes using the RG approach, and use
any available search technique (greedy descent, simulated annealing, genetic
algorithms, etc.) to search through the space of valid parity check matrices.
Because we can directly focus on the correct figure of merit (the bit error
rate itself, rather than the threshold in the infinite blocklength limit),
one expects the search to improve on the results obtained using density
evolution.

A couple of comments are in order. First, because we have information 
about the bit error rate at every node (see figure 15), we might be able to
use that information to guide the search. For example, it might make sense 
to "strengthen" a variable node with a high bit error rate by adding it to 
more parity checks, or one could choose to "weaken" nodes with a low bit 
error rate by turning them into hidden nodes (thus increasing the rate). 

On the other hand, computing the bit error rate of every node will ob- 
viously slow down a search. It may be worthwhile, at least for large block- 
lengths, to restrict oneself to those codes for which there are only a small 
number of different classes of nodes (defined in terms of the local neighbor- 
hoods of the nodes). Most of the best-known codes are of this type. Rather 
than computing the bit error rate for every variable node, one could then 
compute the bit error rate for just one representative of each class of variable 
node. For example, for a regular Gallager code, each node has the same local 
neighborhood, so any node can be chosen as a representative of all the nodes. 
The error made in this approach can be estimated by comparing bit error 
rates of different nodes of the same class. For actual finite-sized regular Gal- 
lager codes, we find that the RG approach will give very similar predictions 
for each of the nodes, so that the error made by just considering a single 
variable node as a representative of all of them is quite small. 